quadruped robot
Beyond Egocentric Limits: Multi-View Depth-Based Learning for Robust Quadrupedal Locomotion
Recent progress in legged locomotion has allowed highly dynamic and parkour-like behaviors for robots, similar to their biological counterparts. Yet, these methods mostly rely on egocentric (first-person) perception, limiting their performance, especially when the viewpoint of the robot is occluded. A promising solution would be to enhance the robot's environmental awareness by using complementary viewpoints, such as multiple actors exchanging perceptual information. Inspired by this idea, this work proposes a multi-view depth-based locomotion framework that combines egocentric and exocentric observations to provide richer environmental context during agile locomotion. Using a teacher-student distillation approach, the student policy learns to fuse proprioception with dual depth streams while remaining robust to real-world sensing imperfections. To further improve robustness, we introduce extensive domain randomization, including stochastic remote-camera dropouts and 3D positional perturbations that emulate aerial-ground cooperative sensing. Simulation results show that multi-viewpoints policies outperform single-viewpoint baseline in gap crossing, step descent, and other dynamic maneuvers, while maintaining stability when the exocentric camera is partially or completely unavailable. Additional experiments show that moderate viewpoint misalignment is well tolerated when incorporated during training. This study demonstrates that heterogeneous visual feedback improves robustness and agility in quadrupedal locomotion. Furthermore, to support reproducibility, the implementation accompanying this work is publicly available at https://anonymous.4open.science/r/multiview-parkour-6FB8
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > Canada > Quebec > Estrie Region > Sherbrooke (0.04)
- North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots
Huang, Ting, Li, Dongjian, Yang, Rui, Zhang, Zeyu, Yang, Zida, Tang, Hao
Grounding natural-language instructions into continuous control for quadruped robots remains a fundamental challenge in vision language action. Existing methods struggle to bridge high-level semantic reasoning and low-level actuation, leading to unstable grounding and weak generalization in the real world. To address these issues, we present MobileVLA-R1, a unified vision-language-action framework that enables explicit reasoning and continuous control for quadruped robots. We construct MobileVLA-CoT, a large-scale dataset of multi-granularity chain-of-thought (CoT) for embodied trajectories, providing structured reasoning supervision for alignment. Built upon this foundation, we introduce a two-stage training paradigm that combines supervised CoT alignment with GRPO reinforcement learning to enhance reasoning consistency, control stability, and long-horizon execution. Extensive evaluations on VLN and VLA tasks demonstrate superior performance over strong baselines, with approximately a 5% improvement. Real-world deployment on a quadruped robot validates robust performance in complex environments. Code: https://github.com/AIGeeksGroup/MobileVLA-R1. Website: https://aigeeksgroup.github.io/MobileVLA-R1.
MLM: Learning Multi-task Loco-Manipulation Whole-Body Control for Quadruped Robot with Arm
Liu, Xin, Ma, Bida, Qi, Chenkun, Ding, Yan, Xu, Nuo, Zhaxizhuoma, null, Zhang, Guorong, Chen, Pengan, Liu, Kehui, Jia, Zhongjie, Guan, Chuyue, Mo, Yule, Liu, Jiaqi, Gao, Feng, Zhong, Jiangwei, Zhao, Bin, Li, Xuelong
Whole-body loco-manipulation for quadruped robots with arms remains a challenging problem, particularly in achieving multi-task control. To address this, we propose MLM, a reinforcement learning framework driven by both real-world and simulation data. It enables a six-DoF robotic arm-equipped quadruped robot to perform whole-body loco-manipulation for multiple tasks autonomously or under human teleoperation. To address the problem of balancing multiple tasks during the learning of loco-manipulation, we introduce a trajectory library with an adaptive, curriculum-based sampling mechanism. This approach allows the policy to efficiently leverage real-world collected trajectories for learning multi-task loco-manipulation. To address deployment scenarios with only historical observations and to enhance the performance of policy execution across tasks with different spatial ranges, we propose a Trajectory-Velocity Prediction policy network. It predicts unobservable future trajectories and velocities. By leveraging extensive simulation data and curriculum-based rewards, our controller achieves whole-body behaviors in simulation and zero-shot transfer to real-world deployment. Ablation studies in simulation verify the necessity and effectiveness of our approach, while real-world experiments on a Go2 robot with an Airbot robotic arm demonstrate the policy's good performance in multi-task execution.
- Asia > China > Shanghai > Shanghai (0.05)
- North America > United States > New York > Broome County > Binghamton (0.04)
X-IONet: Cross-Platform Inertial Odometry Network with Dual-Stage Attention
Learning-based inertial odometry has achieved remarkable progress in pedestrian navigation. However, extending these methods to quadruped robots remains challenging due to their distinct and highly dynamic motion patterns. Models that perform well on pedestrian data often experience severe degradation when deployed on legged platforms. To tackle this challenge, we introduce X-IONet, a cross-platform inertial odometry framework that operates solely using a single Inertial Measurement Unit (IMU). X-IONet incorporates a rule-based expert selection module to classify motion platforms and route IMU sequences to platform-specific expert networks. The displacement prediction network features a dual-stage attention architecture that jointly models long-range temporal dependencies and inter-axis correlations, enabling accurate motion representation. It outputs both displacement and associated uncertainty, which are further fused through an Extended Kalman Filter (EKF) for robust state estimation. Extensive experiments on public pedestrian datasets and a self-collected quadruped robot dataset demonstrate that X-IONet achieves state-of-the-art performance, reducing Absolute Trajectory Error (ATE) by 14.3% and Relative Trajectory Error (RTE) by 11.4% on pedestrian data, and by 52.8% and 41.3% on quadruped robot data. These results highlight the effectiveness of X-IONet in advancing accurate and robust inertial navigation across both human and legged robot platforms.
Learning Omnidirectional Locomotion for a Salamander-Like Quadruped Robot
Liu, Zhiang, Liu, Yang, Fang, Yongchun, Guo, Xian
Salamander-like quadruped robots are designed inspired by the skeletal structure of their biological counterparts. However, existing controllers cannot fully exploit these morphological features and largely rely on predefined gait patterns or joint trajectories, which prevents the generation of diverse and flexible locomotion and limits their applicability in real-world scenarios. In this paper, we propose a learning framework that enables the robot to acquire a diverse repertoire of omnidirectional gaits without reference motions. Each body part is controlled by a phase variable capable of forward and backward evolution, with a phase coverage reward to promote the exploration of the leg phase space. Additionally, morphological symmetry of the robot is incorporated via data augmentation, improving sample efficiency and enforcing both motion-level and task-level symmetry in learned behaviors. Extensive experiments show that the robot successfully acquires 22 omnidirectional gaits exhibiting both dynamic and symmetric movements, demonstrating the effectiveness of the proposed learning framework.
Stable and Robust SLIP Model Control via Energy Conservation-Based Feedback Cancellation for Quadrupedal Applications
Hassan, Muhammad Saud Ul, Vasquez, Derek, Asif, Hamza, Hubicki, Christian
In this paper, we present an energy-conservation based control architecture for stable dynamic motion in quadruped robots. We model the robot as a Spring-loaded Inverted Pendulum (SLIP), a model well-suited to represent the bouncing motion characteristic of running gaits observed in various biological quadrupeds and bio-inspired robotic systems. The model permits leg-orientation control during flight and leg-length control during stance, a design choice inspired by natural quadruped behaviors and prevalent in robotic quadruped systems. Our control algorithm uses the reduced-order SLIP dynamics of the quadruped to track a stable parabolic spline during stance, which is calculated using the principle of energy conservation. Through simulations based on the design specifications of an actual quadruped robot, Ghost Robotics Minitaur, we demonstrate that our control algorithm generates stable bouncing gaits. Additionally, we illustrate the robustness of our controller by showcasing its ability to maintain stable bouncing even when faced with up to a 10% error in sensor measurements.
Stand, Walk, Navigate: Recovery-Aware Visual Navigation on a Low-Cost Wheeled Quadruped
Wheeled-legged robots combine the efficiency of wheels with the obstacle negotiation of legs, yet many state-of-the-art systems rely on costly actuators and sensors, and fall-recovery is seldom integrated, especially for wheeled-legged morphologies. This work presents a recovery-aware visual-inertial navigation system on a low-cost wheeled quadruped. The proposed system leverages vision-based perception from a depth camera and deep reinforcement learning policies for robust locomotion and autonomous recovery from falls across diverse terrains. Simulation experiments show agile mobility with low-torque actuators over irregular terrain and reliably recover from external perturbations and self-induced failures. We further show goal directed navigation in structured indoor spaces with low-cost perception. Overall, this approach lowers the barrier to deploying autonomous navigation and robust locomotion policies in budget-constrained robotic platforms.
Online Object-Level Semantic Mapping for Quadrupeds in Real-World Environments
Razavi, Emad, Bratta, Angelo, Soares, João Carlos Virgolino, Recchiuto, Carmine, Semini, Claudio
Abstract--We present an online semantic object mapping system for a quadruped robot operating in real indoor environments, turning sensor detections into named objects in a global map. During a run, the mapper integrates range geometry with camera detections, merges co-located detections within a frame, and associates repeated detections into persistent object instances across frames. Objects remain in the map when they are out of view, and repeated sightings update the same instance rather than creating duplicates. The output is a compact object layer that can be queried (class, pose, and confidence), is integrated with the occupancy map and readable by a planner . In on-robot tests, the layer remained stable across viewpoint changes.
Proprioceptive Image: An Image Representation of Proprioceptive Data from Quadruped Robots for Contact Estimation Learning
Abati, Gabriel Fischer, Soares, João Carlos Virgolino, Turrisi, Giulio, Barasuol, Victor, Semini, Claudio
Abstract-- This paper presents a novel approach for representing proprioceptive time-series data from quadruped robots as structured two-dimensional images, enabling the use of convolutional neural networks for learning locomotion-related tasks. The proposed method encodes temporal dynamics from multiple proprioceptive signals, such as joint positions, IMU readings, and foot velocities, while preserving the robot's morphological structure in the spatial arrangement of the image. We apply this concept in the problem of contact estimation, a key capability for stable and adaptive locomotion on diverse terrains. Experimental evaluations on both real-world datasets and simulated environments show that our image-based representation consistently enhances prediction accuracy and generalization over conventional sequence-based models, underscoring the potential of cross-modal encoding strategies for robotic state learning. Our method achieves superior performance on the contact dataset, improving contact state accuracy from 87.7% to 94.5% over the recently proposed MI-HGNN method, using a 15 times shorter window size. I. INTRODUCTION Deep learning has achieved remarkable success in domains where sequential or high-dimensional data can be transformed into visual representations suitable to convolu-tional architectures. In speech recognition, for instance, raw audio signals are often converted into spectrograms, two-dimensional time-frequency images, that serve as powerful inputs to convolutional neural networks (CNNs) and other image-based models.
- Europe > Switzerland > Basel-City > Basel (0.04)
- Europe > Italy (0.04)
QuaDreamer: Controllable Panoramic Video Generation for Quadruped Robots
Wu, Sheng, Teng, Fei, Shi, Hao, Jiang, Qi, Luo, Kai, Wang, Kaiwei, Yang, Kailun
Panoramic cameras, capturing comprehensive 360-degree environmental data, are suitable for quadruped robots in surrounding perception and interaction with complex environments. However, the scarcity of high-quality panoramic training data-caused by inherent kinematic constraints and complex sensor calibration challenges-fundamentally limits the development of robust perception systems tailored to these embodied platforms. To address this issue, we propose QuaDreamer-the first panoramic data generation engine specifically designed for quadruped robots. QuaDreamer focuses on mimicking the motion paradigm of quadruped robots to generate highly controllable, realistic panoramic videos, providing a data source for downstream tasks. Specifically, to effectively capture the unique vertical vibration characteristics exhibited during quadruped locomotion, we introduce Vertical Jitter Encoding (VJE). VJE extracts controllable vertical signals through frequency-domain feature filtering and provides high-quality prompts. To facilitate high-quality panoramic video generation under jitter signal control, we propose a Scene-Object Controller (SOC) that effectively manages object motion and boosts background jitter control through the attention mechanism. To address panoramic distortions in wide-FoV video generation, we propose the Panoramic Enhancer (PE)-a dual-stream architecture that synergizes frequency-texture refinement for local detail enhancement with spatial-structure correction for global geometric consistency. We further demonstrate that the generated video sequences can serve as training data for the quadruped robot's panoramic visual perception model, enhancing the performance of multi-object tracking in 360-degree scenes. The source code and model weights will be publicly available at https://github.com/losehu/QuaDreamer.
- Asia > China (0.04)
- North America > United States > Oklahoma > Beaver County (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)